1 Map of Sampling Stations

Load package:

library(marmap)

Import bathymetry data from NOAA:

b = getNOAA.bathy(lon1 = 115, lon2 = 140, lat1 = 35, lat2 = 20, resolution = 1)

Import dataframe with the latitude and longitude of sampling points (in decimal degrees):

pts1 <- read.csv("~/desktop/MiraiDPI/SamplingLatLong.csv")
pts1$Station<- as.character(pts1$Station)

pts_photos <-  read.csv("~/desktop/MiraiDPI/SamplingLatLong2.csv")
pts_photos$Station<- as.character(pts_photos$Station)

pts_acanths <-  read.csv("~/desktop/MiraiDPI/SamplingLatLong3.csv")
pts_acanths$Station<- as.character(pts_acanths$Station)

Make dataframe with map labels and their lat/long:

Clon = c(127.5,131.25, 121.6, 121.03, 120.8, 126)
Clat= c(26, 31.93, 31.1, 23, 27.2, 22)
CLabels= c("Okinawa", "Japan", "China", "Taiwan", "East China Sea", "Philippine Sea")
Countries = cbind.data.frame(Clon, Clat, CLabels)
str(Countries)

Make plot one layer at a time:

#define plot colors
blues <- c("lightsteelblue4", "lightsteelblue3", "lightsteelblue2", "lightsteelblue1")
greys <- c(grey(0.6), grey(0.93), grey(0.99))
#tiff("~/desktop/MiraiDPI/MiraiFiltersMap.tiff", height= 8, width =8, units = 'in', res=300) # uncomment to save plot to file
plot(b, image = TRUE, land = TRUE, xlim=c(120,135), ylim=c(22, 32), lwd = 0.03, bpal = list(c(0, max(b), greys), c(min(b),0 , blues)), cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5) #bathymetry colors
plot(b, lwd = 0.8, deep = 0, shallow = 0, step = 0, add = TRUE) # highlight coastline with solid black line
plot(b, deep=-200, shallow=-200, lwd = 0.4, drawlabels=T, add=T)  # Add -200m isobath
plot(b, deep=-1000, shallow=-1000, lwd = 0.4, drawlabels=T, add=T)  # Add -1000m isobath
plot(b, deep=-2000, shallow=-2000, lwd = 0.4, drawlabels=T, add=T)  # Add -2000m isobath
plot(b, deep=-3000, shallow=-3000, lwd = 0.4, drawlabels=T, add=T)  # Add -3000m isobath
points(pts1, pch = 21, bg = 'white', col = 'red') # Add sampling points
points(pts_photos, pch = 19, col = 'red')  #Add points for stations with photo profiles
text(pts1, labels=pts1$Station, adj= 1.5, cex=1.2) # Add sampling station numbers
text(pts_photos, labels=pts_photos$Station, adj= 1.5, cex=1.2) # add photo station numbers
text(Countries, labels=Countries$CLabels, cex= 1.7, adj = -0.2) # Add map labels 
#dev.off() #uncomment to save plot to file

2 High-throughput sequence analysis

2.1 Identify and Count Amplicon Sequence Variants (ASVs)

Sample collection, DNA extraction, PCR, & sequencing

Seawater collected from the Subsurface Chlorophyll Maximum (SCM), mid-water column, and near-bottom was collected in 10 L Niskin bottles at each sampling station and surface seawater was collected by bucket. Five Liter replicates from SCM, mid, and near-bottom water samples and 4.5 L replicates from surface samples were sequentially filtered through 10.0 and 0.2 µm pore-size Polytetrafluoroethylene (PTFE) filters. Filters were flash frozen in liquid nitrogen and stored at -80 ºC until DNA extraction. DNA was extracted following the manufacturers protocols for the Qiagen/MoBio DNeasy PowerWater Kit, including the optional heating step to lyse cells. Amplicon PCR and sequencing library preparation followed the procedures described in the Illumina 16S Metagenomic Sequencing Library Preparation manual modified only to include universal eukaryotic primers for the v4 region of the 18S rRNA gene (F: Stoeck et al. 2010, R: Brisbin et al. 2018) and amplicon PCR conditions most appropriate for these primers. The 220 samples were radomly assigned to 4 sequencing runs on the Illumina MiSeq sequencing platform with v3 chemistry for 300x300-bp paired-end sequencing.

Sequences are available from the NCBI Sequencing Read Archive (SRA) with accession PRJNA546472.

Bioinformatic processing with Qiime2

Demultiplexed paired-end sequences provided by the OIST sequencing center were imported to Qiime2 (v2018.11) software (Bolyen et al. 2018). The Divisive Amplicon Denoising Algorithm (DADA) was implemented with the DADA2 plug-in for Qiime2 for quality filtering, chimera removal, and feature table construction (Callahan et al. 2015) separately for each sequencing run before merging the four resulting feature tables. Taxonomy was assigned to Amplicon Sequence Variants (ASVs) in the feature table by training a classifier on the Protist Ribosomal Reference (PR2) database to classify ASVs (Guillo et al. 2012). The feature table and assigned taxonomy were then exported from Qiime2 for further analysis.

Below is an example Qiime2 workflow for 1 sequencing run:

Activate an anaconda environment for qiime2: source activate qiime2-2018.11

Import sequencing data (all sequences for eacg sequencing run must be in their own directory): qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path ./seqs/seqrun1/ --input-format CasavaOneEightSingleLanePerSampleDirFmt --output-path run1_demux-paired-end.qza

Visualize squencing results (use these graphs to decide what length to trim sequences) qiime demux summarize --i-data run1_demux-paired-end.qza --o-visualization run1_demux.qzv

Run the DADA2 algorithm: qiime dada2 denoise-paired --i-demultiplexed-seqs run1_demux-paired-end.qza --output-dir ./dd2run1 --o-representative-sequences run1_rep-seqs --p-trim-left-f 10 --p-trim-left-r 10 --p-trunc-len-f 260 --p-trunc-len-r 250 --p-n-threads 3

Check results: qiime feature-table summarize --i-table ./dd2run1/table.qza --o-visualization ./dd2run1/table.qzv --m-sample-metadata-file ~/desktop/MiraiFilters/sampleMaptxt.txt

qiime metadata tabulate --m-input-file dd2run1/denoising_stats.qza --o-visualization dd2run1/denoising_stats.qzv

Merge feature tables from multiple sequencing runs: qiime feature-table merge --i-tables ./dd2run1/table.qza --i-tables ./dd2run2/table.qza --i-tables ./dd2run3/table.qza --i-tables ./dd2run4/table.qza --o-merged-table mergedtable.qza

qiime feature-table merge-seqs --i-data run1_rep-seqs.qza --i-data run2_rep-seqs.qza --i-data run3_rep-seqs.qza --i-data run4_rep-seqs.qza --o-merged-data merged-rep-seqs.qza

Import taxonomy database: qiime tools import --type 'FeatureData[Sequence]' --input-path pr2_version_4.11.1_mothur.fasta --output-path pr2.qza

qiime tools import --type 'FeatureData[Taxonomy]' --input-format HeaderlessTSVTaxonomyFormat --input-path pr2_version_4.11.1_mothur.tax -output-path pr2_tax.qza

Select v4 region from PR2 sequences: qiime feature-classifier extract-reads --i-sequences pr2.qza --p-f-primer CCAGCASCYGCGGTAATTCC --p-r-primer ACTTTCGTTCTTGATYRA --o-reads pr2_v4.qza

Train the classifier: qiime feature-classifier fit-classifier-naive-bayes --i-reference-reads pr2_v4.qza --i-reference-taxonomy pr2_tax.qza --o-classifier pr2_v4_classifier.qza

Classify: qiime feature-classifier classify-sklearn --i-classifier pr2_v4_classifier.qza --i-reads merged-rep-seqs.qza --o-classification taxonomy.qza

qiime metadata tabulate --m-input-file taxonomy.qza --o-visualization taxonomy.qzv

Export the .tsv taxonomy file from taxonomy.qzv for use in following steps.

2.2 Load data into R & prepare phyloseq objects

Load packages:

library(tidyverse)
#devtools::install_github("jbisanz/qiime2R")
library(qiime2R)
library('phyloseq')
library('vegan')
library('ggplot2')
library(RColorBrewer)
library(DESeq2)
library(reshape)
library("wesanderson")
library("gridExtra")
library("ggpubr", lib.loc="~/Library/R/3.5/library")
library(Hmisc)

Set default colors:

cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
P <- brewer.pal(12, "Paired")
S2 <- brewer.pal(8, "Set2")
S1 <- brewer.pal(8, "Set1")
D2 <- brewer.pal(8, "Dark2")
colors<- c(P, S2, S1,D2)
colors2 <- rep(colors, 6)

ap<- c("#F1B6A1", "#D4A52A", "#E3E5DB", "#A5CFCC", "#0E899F", "#A83860", "#ED91BC", "#DB5339", "#F58851", "#42465C", "#1E479A", "#F7CDA4", "#CF529C", "#11638C")

Dj<- wes_palette("Darjeeling2")
Cv <- wes_palette("Cavalcanti1")
Gb<- wes_palette("GrandBudapest1")
wpcolor<- c(Dj[2], Dj[4], Cv[2], Cv[4], Gb[2], Cv[5])

#function for drawing legends separately from plots
g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  legend
}

Convert Qiime2 artifacts to phyloseq objects:

phyloseq<-qza_to_phyloseq(features="~/desktop/MiraiFilters/mergedtable.qza")
## Warning in read_qza(features): Artifact was not generated with Qiime2 2018.4,
## if data is not successfully imported, please report here github.com/jbisanz/
## qiime2R/issues

Import sample data:

metatable <- read.csv("~/desktop/MiraiFilters/SampleMap.csv")
row.names(metatable) <- metatable[[1]]

metatable$Station <- as.character(metatable$Station)
metatable$filter <-as.factor(metatable$filter)

META<- sample_data(metatable)
str(META)
## 'data.frame':    212 obs. of  32 variables:
## Formal class 'sample_data' [package "phyloseq"] with 4 slots
##   ..@ .Data    :List of 32
##   .. ..$ : Factor w/ 212 levels "A6-F173-DNA",..: 3 6 9 12 96 97 115 118 129 134 ...
##   .. ..$ : Factor w/ 212 levels "A6-F173-DNA",..: 3 6 9 12 96 97 115 118 129 134 ...
##   .. ..$ : chr  "2" "2" "2" "2" ...
##   .. ..$ : Factor w/ 8 levels "1","31","32",..: 7 7 8 8 1 1 4 4 6 6 ...
##   .. ..$ : Factor w/ 14 levels "C02","C03","C04",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 4 levels "Bottom","Mid",..: 4 4 4 4 1 1 2 2 3 3 ...
##   .. ..$ : Factor w/ 5 levels "Bottom","Deep",..: 5 5 5 5 1 1 3 3 4 4 ...
##   .. ..$ : Factor w/ 2 levels "D","S": 2 2 2 2 1 1 1 1 2 2 ...
##   .. ..$ : Factor w/ 109 levels "10Bott1","10Bott2",..: 71 71 72 72 65 65 67 67 69 69 ...
##   .. ..$ : int  0 0 0 0 1000 1000 700 700 92 92 ...
##   .. ..$ : Factor w/ 2 levels "2","10": 2 1 2 1 2 1 2 1 2 1 ...
##   .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : Factor w/ 4 levels "KG","MOT","NOT",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 3 levels "KG","NOT","SOT": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 3 levels "n","p","y": 3 3 3 3 3 3 3 3 3 3 ...
##   .. ..$ : Factor w/ 4 levels "KG","NOT","O",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 2 2 2 2 ...
##   .. ..$ : num  0 0 0 0 1056 ...
##   .. ..$ : num  25.25 25.25 25.25 25.25 4.46 ...
##   .. ..$ : num  34.7 34.7 34.7 34.7 34.4 ...
##   .. ..$ : num  202.2 202.2 202.2 202.2 69.8 ...
##   .. ..$ : num  23.1 23.1 23.1 23.1 27.3 ...
##   .. ..$ : num  0.0698 0.0698 0.0698 0.0698 0.051 ...
##   .. ..$ : num  0.061 0.061 0.061 0.061 1.107 ...
##   .. ..$ : num  0.521 0.521 0.521 0.521 11.128 ...
##   .. ..$ : num  0.06 0.06 0.06 0.06 37.51 ...
##   .. ..$ : num  0.01 0.01 0.01 0.01 0.02 0.02 0 0 0.11 0.11 ...
##   .. ..$ : num  1.05 1.05 1.05 1.05 111.04 ...
##   .. ..$ : num  0.033 0.033 0.033 0.033 2.666 ...
##   .. ..$ : num  0.05 0.05 0.05 0.05 0.03 0.03 0.03 0.03 0.04 0.04 ...
##   .. ..$ : num  26.3 26.3 26.3 26.3 26.3 ...
##   .. ..$ : num  126 126 126 126 126 ...
##   ..@ names    : chr  "X" "SampleID" "Station" "Bottle" ...
##   ..@ row.names: chr  "A8-F17-DNA" "B8-F18-DNA" "C8-F19-DNA" "D8-F20-DNA" ...
##   ..@ .S3Class : chr "data.frame"

Load PR2 taxonomy:

taxonomy <- read.csv("~/desktop/MiraiFilters/taxonomy.csv", stringsAsFactors = FALSE)
names(taxonomy) <- c("row", "tax", "Confidence")
row.names(taxonomy) <-taxonomy[[1]]
taxonomy <- taxonomy[,(-1)]
taxonomy <-  separate(taxonomy, tax, c("D0","D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9", "D10", "D11", "D12", "D13", "D14"), sep = ";", fill = "right")
taxonomy <- taxonomy[,c(1:8)]
taxmat <- as.matrix(taxonomy)
TAX = tax_table(taxmat)
str(TAX)
## Formal class 'taxonomyTable' [package "phyloseq"] with 1 slot
##   ..@ .Data: chr [1:23115, 1:8] "Eukaryota" "Eukaryota" "Eukaryota" "Eukaryota" ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:23115] "000a220fe8edd771dd82f6915627ef3e" "000c79760dd16d6692018deb8ea1b164" "000d9eaa21eaff08f34fb1a1a87601d5" "000e6f109e01181b2b1eda5d6c99319c" ...
##   .. .. ..$ : chr [1:8] "D0" "D1" "D2" "D3" ...

Add taxonomy to phyloseq object:

ps = merge_phyloseq(phyloseq, TAX, META)

Some basic statistics:

#make ASV dataframe
OTUs <- data.frame(otu_table(ps))

Total number of ASVs in the data set:

OTUsRS<- OTUs
OTUsRS$RowSum <- rowSums(OTUsRS)
OTUsRSnoZero <- OTUsRS$RowSum!=0
sum(OTUsRSnoZero)
## [1] 22656

Total Number of Sequences:

sum(OTUsRS$RowSum)
## [1] 16340248

merge ASV table with taxonomy:

otusWtax <- merge(OTUs, taxonomy, by = 'row.names')

keep only Acantharian ASVs:

otusA <-  filter(otusWtax, D3 == "Acantharea") 

Number of acantharian ASVs in the data set:

OTUsRS<- otusA[,-c(1,213:220)]

OTUsRS$RowSum <- rowSums(OTUsRS)
OTUsRSnoZero <- OTUsRS$RowSum!=0
sum(OTUsRSnoZero)
## [1] 1053

Acantharian ASV Richness:

OTUs0 <- OTUsRS!=0 #is this number not a zero? true (1) or false (0)
csums <- colSums(OTUs0) # col sums = observed ASV richness
csumdf <- as.data.frame(csums)
max(csumdf$csums) #1906
## [1] 1053
min(csumdf$csums) #49
## [1] 2
mean(csumdf$csums) #730
## [1] 37.84434

how many acantharian seqs total:

csums <- colSums(OTUsRS) # col sums = observed ASV richness
csumdf <- as.data.frame(csums)
sum(csumdf$csums)
## [1] 1099858

2.3 Distance and Ordination

remove mislabeled samples

mixedup <- c("F43", "F44", "F45", "F46", "F47", "F48", "F84")
'%ni%' <- Negate('%in%')
ps<- subset_samples(ps, SampleID %ni% mixedup)

remove metazoan ASVs and seq not classified at the kindom level

ps<-subset_taxa(ps, D1 != "")
ps<-subset_taxa(ps, D1 != "NA")
ps<-subset_taxa(ps, D2 != "Metazoa")
  • compute CLR normalization, CLR = centered log-ratio, log(x/gx) where gx is the geomentric mean of vector x

2.3.1 Full Protist Community

library("CoDaSeq")
OTU4clr<- data.frame(t(data.frame(otu_table(ps))))
row.names(OTU4clr) <- gsub("\\.", "", row.names(OTU4clr))
OTUs.clr <- codaSeq.clr(OTU4clr + 0.5, samples.by.row=TRUE)
OTU2 <- otu_table(as.matrix(OTUs.clr), taxa_are_rows = FALSE)
metatableCLR <-metatable
row.names(metatableCLR) <- gsub("-", "", row.names(metatableCLR))
META2<- sample_data(metatable)
psCLR <- phyloseq(OTU2,TAX,META2)
ordu = ordinate(psCLR, "PCoA", "euclidean")
p_full<-plot_ordination(psCLR, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values=c(ap[8], ap[12], ap[5], ap[4])) + scale_shape_discrete(labels=c( "0.2", "10.0")) +
geom_point(size=3) +theme(text = element_text(size=16)) + ggtitle("A. Full Protist Communities")
p_full

There is clustering by sample depth: DCM, Surface, and Mid/Bottom. In the DCM and Surface, there is also clustering by filter type.

2.3.1.1 PCoA by depth layer

to better observe effect of filter pore-size

Surface

physeqSurf<- subset_samples(psCLR, Depth == "Surface")
OTUsSurf <- otu_table(physeqSurf)
metaSurf <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSurf),]
adonis2(vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1     3112 0.04279 1.2516  0.203
## Residual 28    69632 0.95721              
## Total    29    72745 1.00000
ordu = ordinate(physeqSurf, "PCoA", "euclidean")
pSurf_full<-plot_ordination(physeqSurf, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16)) #+ geom_vline(xintercept=0, linetype="dashed", color = "black")

pSurf_full

SCM

physeqSCM<- subset_samples(psCLR, Depth == "SCM")
OTUsSCM <- otu_table(physeqSCM)
metaSCM <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSCM),]
adonis2(vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1     2630 0.01457 0.7096   0.91
## Residual 48   177939 0.98543              
## Total    49   180569 1.00000
ordu = ordinate(physeqSCM, "PCoA", "euclidean")
pSCM_full<-plot_ordination(physeqSCM, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16))# + geom_vline(xintercept=0, linetype="dashed", color = "black")

pSCM_full

Mid

physeqMid<- subset_samples(psCLR, Depth == "Mid")
OTUsMid <- otu_table(physeqMid)
metaMid <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsMid),]
adonis2(vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1     2418 0.02254 1.1067  0.246
## Residual 48   104884 0.97746              
## Total    49   107302 1.00000
ordu = ordinate(physeqMid, "PCoA", "euclidean")
pMid_full<-plot_ordination(physeqMid, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16)) #+ geom_hline(yintercept=0, linetype="dashed", color = "black")

pMid_full

Bottom

physeqBot<- subset_samples(psCLR, Depth == "Bottom")
OTUsBot <- otu_table(physeqBot)
metaBot <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsBot),]
adonis2(vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1     2255 0.01729 0.8972  0.687
## Residual 51   128207 0.98271              
## Total    52   130462 1.00000
ordu = ordinate(physeqBot, "PCoA", "euclidean")
pBot_full<-plot_ordination(physeqBot, ordu, color = "filter")+theme_bw() +scale_color_manual(values = c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16)) 
pBot_full

supp1<-ggarrange(pSurf_full, pSCM_full, pMid_full, pBot_full, common.legend = TRUE, legend = "none")
#supp1

2.3.2 Acantharian Communities

library("CoDaSeq")
psAcan <- subset_taxa(ps, D3 =="Acantharea")
OTU4clr<- data.frame(t(data.frame(otu_table(psAcan))))
row.names(OTU4clr) <- gsub("\\.", "", row.names(OTU4clr))
OTUs.clr <- codaSeq.clr(OTU4clr + 0.5, samples.by.row=TRUE)
OTU2 <- otu_table(as.matrix(OTUs.clr), taxa_are_rows = FALSE)
metatableCLR <-metatable
row.names(metatableCLR) <- gsub("-", "", row.names(metatableCLR))
META2<- sample_data(metatable)
psCLR <- phyloseq(OTU2,TAX,META2)
ordu = ordinate(psCLR, "PCoA", "euclidean")
p_Acanth<-plot_ordination(psCLR, ordu, color="Depth", shape="filter")+theme_bw() +scale_color_manual(values=c(ap[8], ap[12], ap[5], ap[4])) + scale_shape_discrete(labels=c( "0.2", "10.0")) +
geom_point(size=3) +theme(text = element_text(size=16))  + ggtitle("B. Acantharian Communities")
p_Acanth

There is clustering by sample depth: SCM, Surface, and Mid/Bottom.

2.3.2.1 PCoA by depth layer

Surface

physeqSurf<- subset_samples(psCLR, Depth == "Surface")
OTUsSurf <- otu_table(physeqSurf)
metaSurf <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSurf),]
adonis2(vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsSurf, method = "euclidean") ~ filter, data = metaSurf)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1    34.85 0.03285 0.9511  0.479
## Residual 28  1025.93 0.96715              
## Total    29  1060.78 1.00000
ordu = ordinate(physeqSurf, "PCoA", "euclidean")
pSurf_As<-plot_ordination(physeqSurf, ordu, color = "filter")+theme_bw() +scale_color_manual(values =  c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16))

pSurf_As

SCM

physeqSCM<- subset_samples(psCLR, Depth == "SCM")
OTUsSCM <- otu_table(physeqSCM)
metaSCM <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsSCM),]
adonis2(vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsSCM, method = "euclidean") ~ filter, data = metaSCM)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1     88.2 0.01092 0.5301  0.989
## Residual 48   7986.6 0.98908              
## Total    49   8074.8 1.00000
ordu = ordinate(physeqSCM, "PCoA", "euclidean")
pSCM_As<-plot_ordination(physeqSCM, ordu, color ="filter")+theme_bw() +scale_color_manual(values =  c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16)) 

pSCM_As

Mid

physeqMid<- subset_samples(psCLR, Depth == "Mid")
OTUsMid <- otu_table(physeqMid)
metaMid <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsMid),]
adonis2(vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsMid, method = "euclidean") ~ filter, data = metaMid)
##          Df SumOfSqs      R2      F Pr(>F)
## filter    1    138.1 0.01857 0.9084  0.494
## Residual 48   7295.9 0.98143              
## Total    49   7434.0 1.00000
ordu = ordinate(physeqMid, "PCoA", "euclidean")
pMid_As<-plot_ordination(physeqMid, ordu, color = "filter")+theme_bw() +scale_color_manual(values =  c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16)) 

pMid_As

Bottom

physeqBot<- subset_samples(psCLR, Depth == "Bottom")
OTUsBot <- otu_table(physeqBot)
metaBot <- metatableCLR[row.names(metatableCLR) %in% row.names(OTUsBot),]
adonis2(vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = vegdist(OTUsBot, method = "euclidean") ~ filter, data = metaBot)
##          Df SumOfSqs      R2     F Pr(>F)
## filter    1    136.9 0.01535 0.795  0.749
## Residual 51   8780.4 0.98465             
## Total    52   8917.2 1.00000
ordu = ordinate(physeqBot, "PCoA", "euclidean")
pBot_As<-plot_ordination(physeqBot, ordu, color = "filter")+theme_bw() +scale_color_manual(values =  c("darkgrey", Cv[5] ))+ geom_point(size=2) +theme(text = element_text(size=16))

pBot_As

supp_As<-ggarrange(pSurf_As, pSCM_As, pMid_As, pBot_As, common.legend = TRUE, legend = "none")
#supp_As
supp2<-ggarrange(p_full, p_Acanth, common.legend = TRUE, legend = "bottom")
#supp2

2.4 Relative Abundance Plot

Merge Samples –> collapse replicates

Prepare metadata table:

metatable <- read.csv("sampleMap.csv")
row.names(metatable) <- metatable[[1]]
metatable$desc <- paste(metatable$Station, metatable$Depth, metatable$filter, sep="-")
metatable <- metatable[metatable$SampleID %ni% mixedup,]
metatable$Depth <- as.character(metatable$Depth)
metatable$Station <-as.character(metatable$Station)
metatable$filter<-as.character(metatable$filter)
metatable<- metatable[,-which(names(metatable) == "SampleID")]
META<- sample_data(metatable)

ps<- subset_samples(ps, SampleID %ni% mixedup)
ps<-subset_taxa(ps, D1 != "")
ps<-subset_taxa(ps, D1 != "NA")
ps<-subset_taxa(ps, D2 != "Metazoa")

ps2<- ps
sample_data(ps2) <- META

Merge:

mergedps <- merge_samples(ps2, "desc")
## Warning in asMethod(object): NAs introduced by coercion

## Warning in asMethod(object): NAs introduced by coercion
print(mergedps)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 19574 taxa and 112 samples ]
## sample_data() Sample Data:       [ 112 samples by 32 sample variables ]
## tax_table()   Taxonomy Table:    [ 19574 taxa by 8 taxonomic ranks ]

number of samples reduced from 211 to 112

Fix meta table in phyloseq object:

meta2<-as.data.frame(sample_data(mergedps))
split<- do.call(rbind, strsplit(row.names(meta2), "-"))
meta2$Depth <- split[,2]
meta2$filter<-split[,3]
meta2$desc <- row.names(meta2)
meta2$Station <- as.character(meta2$Station)
meta2$Depth_f <- factor(meta2$Depth, levels=c('Surface','SCM','Mid','Bottom'))
meta2$Station_f <- factor(meta2$Station, levels=c("11", "10", "9", "8", "3", "4", "2", "5", "12", "13", "14", "15", "17", "18"))
META <-sample_data(meta2)
sample_data(mergedps)<-META

Create phyloseq object with acantharian ASVs only:

phA<-subset_taxa(mergedps, D3 =="Acantharea") 
phAra<- transform_sample_counts(phA, function(OTU) 100* OTU/sum(OTU))
print(phAra)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1053 taxa and 112 samples ]
## sample_data() Sample Data:       [ 112 samples by 34 sample variables ]
## tax_table()   Taxonomy Table:    [ 1053 taxa by 8 taxonomic ranks ]

Relative abundance plot:

taxabarplot<-plot_bar(phAra, x= "Station_f", fill = "D4", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values=wpcolor) + theme(legend.title=element_blank()) + geom_bar(aes( fill=D4), stat="identity", position="stack") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +theme_classic() +theme(text = element_text(size=14)) +ylab("Relative Abundance (%)") + xlab ("Station")

t2<-taxabarplot + theme(legend.position="bottom")
t2

#ggsave("/Users/brisbin/Desktop/MiraiDPI/Acantharea_TaxaBarPlot.pdf", width = 12, height =7)

The relative abundance plot shows that the two filters for each sample are similar at all four water depths. This is different than the pattern seen for the full community, where the samples clustered by filter type in the surface and SCM (and are visibly different in relative abundance plots).

The surface samples are dominated by photosymbiotic acantharian sequences (Arth/Symph).

The SCM sees the addition of Acantharea-Group-II and Group VI and slightly more Chaunacanthida.

Mid and Bottom samples are dominated by Group-I, but still has the others as well, including photosymbiotic clades.

Change order of clades in plot and group Acantharea_X with NA

a_ra <- psmelt(phAra)

a_ra$D4<- as.character(a_ra$D4)

a_ra$D4[is.na(a_ra$D4)] <- "Acantharea_X"

wpcolor<- c(Dj[2], Dj[4], Cv[2], Cv[4], Cv[5], Gb[2])

a_ra$D4<- factor(a_ra$D4, levels= c("Acantharea_X", "Acantharea-Group-I", "Acantharea-Group-II" , "Acantharea-Group-VI" , "Chaunacanthida", "Arthracanthida-Symphyacanthida" ))

a_ra$Station <- factor(a_ra$Station, levels=c("11", "10", "3", "9","4", "8", "2", "5", "12", "14", "15", "13", "17", "18"))

t2<-ggplot(a_ra, aes(x=Station, y=Abundance, fill=D4, )) + geom_bar(stat = "identity") +facet_grid(filter~Depth_f)+ scale_y_continuous(expand = c(0, 0)) +theme_classic() + scale_fill_manual(values=wpcolor) +theme(text = element_text(size=14)) +ylab("Relative Abundance (%)") + xlab ("Station")

t2<-t2 + theme(legend.position="bottom")
t2

#ggsave("/Users/brisbin/Desktop/MiraiDPI/Acantharea_TaxaBarPlot_RO.pdf", width = 11, height =6)

Take a closer look at the Chaunacanthida & Arthracanthida/Symphycanthida relative abundance comapared to eachother

phA<-subset_taxa(mergedps, D4 %in% c("Arthracanthida-Symphyacanthida", "Chaunacanthida") ) 
phAra<- transform_sample_counts(phA, function(OTU) 100* OTU/sum(OTU))

a_ra <- psmelt(phAra)

t3<-plot_bar(phAra, x= "Station_f", fill = "D4", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + scale_fill_manual(values=c(wpcolor[6],wpcolor[5])) + theme(legend.title=element_blank()) + geom_bar(aes( fill=D4), stat="identity", position="stack") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +theme_classic() +theme(text = element_text(size=14)) +ylab("Relative Abundance (%)") + xlab ("Station")

t3<-t3 + theme(legend.position="bottom")
t3

a 10.0 filter and a 0.2 filter in mid-water has only Chaunacanthida, but station 3 only has arth/symph in 0.2 filter from bottom water

overall - does not seem to be a pattern as to whether C or A/S is more abundant in the deeper samples

Relative abundance of Arthracanthida/Symphyacanthida seqs

phA<-subset_taxa(mergedps, D4 == "Arthracanthida-Symphyacanthida" ) 

oOTUs <- data.frame(otu_table(phA))

phAra<- transform_sample_counts(phA, function(OTU) 100* OTU/sum(OTU))

a_ra <- psmelt(phAra)

t3<-plot_bar(phAra, x= "Station_f", fill = "D6", facet_grid=filter~Depth_f) + scale_y_continuous(expand = c(0, 0)) + ggtitle("") + theme(legend.title=element_blank()) + geom_bar(aes( fill=D6), stat="identity", position="stack") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +theme_classic() +theme(text = element_text(size=14)) +ylab("Relative Abundance (%)") + xlab ("Station")

t3<-t3 + theme(legend.position="bottom")
t3
## Warning: Removed 448 rows containing missing values (position_stack).

## Warning: Removed 448 rows containing missing values (position_stack).

Really interesting pattern in the sequences classified to genus level:

  • Acanthostaurs seems to be more abundant in surface and SCM, but present primarily in smaller fractions in deeper samples

  • Phyllostaurus seems to be only present in deeper samples, on both filters, but mostly on the larger filter.

  • Most of the ASVs are not classified to genus level (yellow and grey)

2.5 Are individual Arth-Symph ASVs present in all depth layers?

Merge by depth: Merge:

mergedpsD <- merge_samples(phA, "Depth")
## Warning in asMethod(object): NAs introduced by coercion

## Warning in asMethod(object): NAs introduced by coercion
print(mergedpsD)
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 224 taxa and 4 samples ]
## sample_data() Sample Data:       [ 4 samples by 34 sample variables ]
## tax_table()   Taxonomy Table:    [ 224 taxa by 8 taxonomic ranks ]

number of samples reduced from 211 to 4

a_ra <- psmelt(mergedpsD)

a_ra1<- a_ra %>% filter(Abundance > 0) %>%  mutate(D6= na_if(D6, "Arthracanthida-Symphyacanthida_XX"))

a_ra1$D6 <- as.character(a_ra1$D6)
a_ra2 <- a_ra1 %>% replace_na(list(D6 = "XXXX")) %>% arrange(D6)


#reverse!
a_ra2$OTU <- factor(a_ra2$OTU, levels = rev(unique(a_ra2$OTU )))

#change x axis order

bubble_plot <- ggplot(a_ra2,aes(Sample, OTU)) +
    geom_point(aes(size=Abundance, fill=D6),shape=21,color="black") +
    theme(panel.grid.major=element_line(linetype=1,color="grey"),
          axis.text.x=element_text(angle=90,hjust=1,vjust=0),
          panel.background = element_blank()) +
    ylab("ASVs") +
    xlab("Depth") +
    scale_fill_brewer(palette="Paired", name="Genus") +
    scale_size(name = "Read\ncounts")+ theme(axis.text.y=element_blank())

bubble_plot

Yes! 32 out of 224 individual Arth/Symph ASVs are present in surface/SCM and mid/bottom.

The more abundant ASVs were often found in deep and shallow samples.

This makes sense … if they are more abundant, it is more likely that they are caught in multiple 5 L samples.

Looking at the graph - only Phyllostaurus seems to be predominantly in aphotic zone despite having high abundance (although 3 ASVs also in shallow samples)

2.6 Acantharian Sequence Percentage scatter plots (for comparison with imaging results)

2.6.1 All Acantharians

unfiltered samples - percent acantharean in each 10 um filter - plot as points by depth

phyloseq<-qza_to_phyloseq(features="~/desktop/MiraiFilters/mergedtable.qza")
## Warning in read_qza(features): Artifact was not generated with Qiime2 2018.4,
## if data is not successfully imported, please report here github.com/jbisanz/
## qiime2R/issues
metatable <- read.csv("~/desktop/MiraiFilters/sampleMap.csv")
row.names(metatable) <- metatable[[1]]

metatable$Station <- as.character(metatable$Station)
metatable$filter <- as.character(metatable$filter)

META_all<- sample_data(metatable)

ps = merge_phyloseq(phyloseq, TAX, META_all)

ps<- subset_samples(ps, SampleID %ni% mixedup)

ps<-subset_taxa(ps, D1 != "")
ps<-subset_taxa(ps, D1 != "NA")
ps<-subset_taxa(ps, D2 != "Metazoa")

psD3 = tax_glom(ps, "D3")
psD3ra <- transform_sample_counts(psD3, function(OTU) 100* OTU/sum(OTU))
psD3ra_Aonly<- subset_taxa(psD3ra, D3 =="Acantharea") 
psD3ra_Aonly_10<-subset_samples(psD3ra_Aonly, filter == "10")
psD3ra_Aonly_10_otus <-data.frame(t(data.frame(otu_table(psD3ra_Aonly_10))))
names(psD3ra_Aonly_10_otus) <- c("acanth")
metatable_percent_plot <- metatable
row.names(metatable_percent_plot) <- gsub("-", ".", row.names(metatable_percent_plot))
psD3ra_Aonly_10_otus_meta <- merge(psD3ra_Aonly_10_otus, metatable_percent_plot, by = "row.names")

check to see if linear regression assumptions are met:

depthLM <- lm(acanth ~ m, data = psD3ra_Aonly_10_otus_meta)
par(mfrow = c(2, 2))
plot(depthLM)

Residuals are not normally distributed

Homoscedasticity doesn’t look terrible, but..

# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(depthLM)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 7.595509, Df = 1, p = 0.0058514

Not homoscedastic

–> use local curve fitting (loess)

percent_depth <- ggplot(psD3ra_Aonly_10_otus_meta, aes(x=m, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=19)) + xlab("Depth (m)") +ylab("% Acantharian Sequences") +ggtitle("A.") +
  geom_smooth(span =1, color = "#11638C") + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
percent_depth 
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Correlation Coefficient

Check if data are normally distributed (if normally distributed, use parametric Pearson coefficient; if not, use non-paranetric Spearman coefficient)

ggdensity(psD3ra_Aonly_10_otus_meta$acanth, 
          main = "Density plot of acanth percentage",
          xlab = "acanth percentage")

ggqqplot(psD3ra_Aonly_10_otus_meta$acanth)

shapiro.test(psD3ra_Aonly_10_otus_meta$acanth)
## 
##  Shapiro-Wilk normality test
## 
## data:  psD3ra_Aonly_10_otus_meta$acanth
## W = 0.88623, p-value = 1.422e-07

if p-value > 0.05 data are normally distributed

this p value indicates that the data distribution is significantly different than normal distribution

assumptions for Pearsons (parametric) - both values are normally distributed, linearity, and homoscedasticity, no outliers

Spearman does not require normally distributed data (non-parametric) –> Use Spearman

df_for_corr <- psD3ra_Aonly_10_otus_meta %>% dplyr::select(acanth, m)
rcorr(as.matrix(df_for_corr), type = "spearman")
##        acanth    m
## acanth   1.00 0.59
## m        0.59 1.00
## 
## n= 108 
## 
## 
## P
##        acanth m 
## acanth         0
## m       0

Percent contribution of acantharians to the full community is significantly positively correlated to depth (increased % with increased depth).

2.6.2 Only photosymbiotic acantharian clades

psD4 = tax_glom(ps, "D4")
psD4ra <- transform_sample_counts(psD4, function(OTU) 100* OTU/sum(OTU))
psD4ra_Aonly<- subset_taxa(psD4ra, D4 %in% c("Arthracanthida-Symphyacanthida", "Chaunacanthida")) 
psD4ra_Aonly_10<-subset_samples(psD4ra_Aonly, filter == "10")
psD4ra_Aonly_10_otus <-data.frame(t(data.frame(otu_table(psD4ra_Aonly_10))))
psD4ra_Aonly_10_otus$acanth <- rowSums(psD4ra_Aonly_10_otus)
metatable_percent_plot <- metatable
row.names(metatable_percent_plot) <- gsub("-", ".", row.names(metatable_percent_plot))
psD4ra_Aonly_10_otus_meta <- merge(psD4ra_Aonly_10_otus, metatable_percent_plot, by = "row.names")

check to see if linear regression assumptions are met:

depthLM <- lm(acanth ~ m, data = psD4ra_Aonly_10_otus_meta)
par(mfrow = c(2, 2))
plot(depthLM)

Residuals are not normally distributed

doesn’t look homoscedastic …

# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(depthLM)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 8.223355, Df = 1, p = 0.0041355

Not homoscedastic

–> use local curve fitting (loess)

percent_depth_photo <- ggplot(psD4ra_Aonly_10_otus_meta, aes(x=m, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=19)) + xlab("Depth (m)") +ylab("% Arthracanthida-Symphyacanthida-Chaunacanthida") +ggtitle("B.") +
  geom_smooth(span =1, color = Cv[5]) + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())

percent_depth_photo
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Correlation Coefficient

Check if data are normally distributed (if normally distributed, use parametric Pearson coefficient; if not, use non-paranetric Spearman coefficient)

ggdensity(psD4ra_Aonly_10_otus_meta$acanth, 
          main = "Density plot of acanth percentage",
          xlab = "acanth percentage")

ggqqplot(psD4ra_Aonly_10_otus_meta$acanth)

shapiro.test(psD4ra_Aonly_10_otus_meta$acanth)
## 
##  Shapiro-Wilk normality test
## 
## data:  psD4ra_Aonly_10_otus_meta$acanth
## W = 0.78083, p-value = 2.138e-11

NOT normal –> use Spearman correlation coefficient

dffor_corr <- psD4ra_Aonly_10_otus_meta %>% dplyr::select(acanth, m)
rcorr(as.matrix(dffor_corr), type = "spearman")
##        acanth     m
## acanth   1.00 -0.61
## m       -0.61  1.00
## 
## n= 108 
## 
## 
## P
##        acanth m 
## acanth         0
## m       0

Percent contribution of photosymbiotic acantharians to the full community is significantly negatively correlated to depth (decreased % with increased depth).

ggarrange(percent_depth, percent_depth_photo)

#ggsave("loess_figs1.png", width = 14, height = 7)

3 High-throughput imaging analysis

Acantharian ROIs and full raw images are archived on Zenodo (doi: 10.5281/zenodo.3605400).

Load additional packages:

library(readr)
library(reshape2)
#library("plyr", lib.loc="/Library/Frameworks/R.framework/Versions/3.4/Resources/library")

3.1 Acantharian abundance by depth

Station 17

Load and format CTD data:

ctd17 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0006/CTD/6KCDT0006_CTD_data_1sec_int.csv")
ctd17 <- ctd17[,1:4]
names(ctd17) <- c("Time", "Temp", "Salinity", "Depth")
ctd17$Depth <- ctd17$Depth * -1
ctd17$station <- 17
ctd17$Time <- gsub(":", "", ctd17$Time)
ctd17$Time2 <- paste0('201706090', ctd17$Time)
st17aCTD <- ctd17[,c(4,6)]
names(st17aCTD)<-c("Depth","photos")
str(st17aCTD)
## 'data.frame':    9576 obs. of  2 variables:
##  $ Depth : num  -12.6 -12.9 -13.1 -13.3 -13.3 ...
##  $ photos: chr  "20170609085210" "20170609085211" "20170609085212" "20170609085213" ...

Load and format image data:

st17as<- read.table("~/Desktop/MiraiDPI/st17DPI/st17rois.txt", header = FALSE)
names(st17as) <- c("photos")
st17as$photos <- strtrim(st17as$photos, 14)
st17as$photos <- as.numeric(st17as$photos)
## Warning: NAs introduced by coercion

Add depth from CTD data to images:

st17acanth <- merge(st17aCTD,st17as, by = "photos" )

Calculate the sampling volume (in order to get cells/L): Load names of all images from station 17 downcast:

st17pics<- read.table("~/desktop/MiraiDPI/st17DPI/st17ALLphotos.txt", header = TRUE)

st17pics$photos <- strtrim(st17pics$photos, 14)
st17pics$photos <- as.numeric(st17pics$photos)
## Warning: NAs introduced by coercion
st17pics <- merge(st17aCTD,st17pics, by = "photos" )

Get dataframe of the number of photos by depth at 10m intervals:

br = seq( -800, 0, by = 10)
ranges = br[-1]
freq   = hist(st17pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

P<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P)
## 'data.frame':    80 obs. of  2 variables:
##  $ depth         : num  -790 -780 -770 -760 -750 -740 -730 -720 -710 -700 ...
##  $ photofrequency: int  0 0 215 26 26 25 26 26 26 7 ...

How many photos total?

sum(P$photofrequency)
## [1] 2453

Get same data frame for Acantharian frequency:

br = seq( -800, 0, by = 10)
ranges = br[-1]
freq   = hist(st17acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

A<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A)
## 'data.frame':    80 obs. of  2 variables:
##  $ depth          : num  -790 -780 -770 -760 -750 -740 -730 -720 -710 -700 ...
##  $ Acanthfrequency: int  0 0 2 0 0 0 0 0 0 0 ...

How many acanths total?

sum(A$Acanthfrequency)
## [1] 237

Divide the acantharian frequency by photo frequency and convert to cells/L.

For stations 3 and 10 the camera and led were 15.4cm apart. For stations 15 and 17, the camera and LED were 14 cm apart.

5.5 x 4.6 x 15.4 = 389.62 cubic cm = 0.38962 L at stations 3 & 10
5.5 x 4.6 x 14 = 354.2 cubic cm = 0.3542 L at stations 15 & 17

station 17 volume calculation:

frequencyDF17 <- merge(A, P, by = "depth")[-1,]
frequencyDF17$conc <- frequencyDF17$Acanthfrequency / frequencyDF17$photofrequency
frequencyDF17$perLiter <- frequencyDF17$conc / 0.3542
frequencyDF17$Station <- "st.17"
frequencyDF17$pos_depth <- frequencyDF17$depth * -1

Station 3

Load and format CTD data:

ctd3 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0003/CTD/6KCDT0003_CTD_data_1sec_int.csv")
ctd3 <- ctd3[,1:4]
names(ctd3) <- c("Time", "Temp", "Salinity", "Depth")
ctd3$Depth <- ctd3$Depth * -1
ctd3$station <- 3
ctd3$Time <- gsub(":", "", ctd3$Time)
ctd3$Time2 <- paste0('201705310', ctd3$Time)
st3aCTD <- ctd3[,c(4,6)]
names(st3aCTD)<-c("Depth","photos")
str(st3aCTD)
## 'data.frame':    6584 obs. of  2 variables:
##  $ Depth : num  -7.39 -7.28 -7.21 -7.22 -7.26 ...
##  $ photos: chr  "20170531074319" "20170531074320" "20170531074321" "20170531074322" ...

Load and format image data:

st3as<- read.table("~/Desktop/MiraiDPI/st3DPI/st3rois.txt", header = FALSE)
names(st3as) <- c("photos")
st3as$photos <- strtrim(st3as$photos, 14)
st3as$photos <- as.numeric(st3as$photos)
## Warning: NAs introduced by coercion
st3acanth <- merge(st3aCTD,st3as, by = "photos" )

Calculate the sampling volume (in order to get cells/L): Load names of all images from station 3 downcast:

st3pics<- read.table("~/desktop/MiraiDPI/st3DPI/st3ALLphotos.txt", header = TRUE)
st3pics$photos <- strtrim(st3pics$photos, 14)
st3pics$photos <- as.numeric(st3pics$photos)
st3pics <- merge(st3aCTD,st3pics, by = "photos" )
st3pics <- st3pics[,-3]
names(st3pics)<- c("photos", "Depth")
st3pics$photos <- as.character(st3pics$photos)

Get data frame of image frequency by depth in 10m intervals:

br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq   = hist(st3pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

P3<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P3)
## 'data.frame':    102 obs. of  2 variables:
##  $ depth         : num  -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
##  $ photofrequency: int  0 109 38 39 40 24 33 38 39 38 ...

How many photos total?

sum(P3$photofrequency)
## [1] 4010

Get same data frame for Acantharian frequency:

br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq   = hist(st3acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

A3<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A3)
## 'data.frame':    102 obs. of  2 variables:
##  $ depth          : num  -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
##  $ Acanthfrequency: int  0 0 0 0 0 0 0 0 0 0 ...

How many acanths total?

sum(A3$Acanthfrequency)
## [1] 115

Divide the acantharian frequency by photo frequency and convert to cells/L. And, plot cells/L by depth:

frequencyDF3 <- merge(A3, P3, by = "depth")[-1,]
frequencyDF3$conc <- frequencyDF3$Acanthfrequency / frequencyDF3$photofrequency
frequencyDF3$perLiter <- frequencyDF3$conc / 0.38962
frequencyDF3$pos_depth <- frequencyDF3$depth * -1
frequencyDF3$Station <- "st.3"

Station 10

Load and format CTD data:

ctd10 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0004/CTD/6KCDT0004_CTD_data_1sec_int.csv")
ctd10 <- ctd10[,1:4]
names(ctd10) <- c("Time", "Temp", "Salinity", "Depth")
ctd10$Depth <- ctd10$Depth * -1
ctd10$station <- 10
ctd10$Time <- gsub(":", "", ctd10$Time)
ctd10$Time2 <- paste0('201706020', ctd10$Time)
st10aCTD <- ctd10[,c(4,6)]
names(st10aCTD)<-c("Depth","photos")

Load and format image data:

st10aNames<- c("photos")
st10as<- read.table("~/Desktop/MiraiDPI/st10DPI/st10rois.txt", col.names = st10aNames)
st10as$photos <- strtrim(st10as$photos, 14)
st10acanth <- merge(st10aCTD,st10as, by = "photos" )

Calculate the sampling volume (in order to get cells/L): Load names of all images from station 3 downcast:

st10pics<- read.table("~/desktop/MiraiDPI/st10DPI/st10ALLphotos.txt", col.names = st10aNames)
st10pics$photos <- strtrim(st10pics$photos, 14)
st10pics$photos <- as.numeric(st10pics$photos)
st10pics <- merge(st10aCTD,st10pics, by = "photos" )
st10pics <- st10pics[,-3]
names(st10pics)<- c("photos", "Depth")
st10pics$photos <- as.character(st10pics$photos)

Get data frame of image frequency by depth in 10m intervals:

br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq   = hist(st10pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

P10<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P10)
## 'data.frame':    102 obs. of  2 variables:
##  $ depth         : num  -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
##  $ photofrequency: int  0 6 51 33 32 33 32 32 33 33 ...

How many photos total?

sum(P10$photofrequency)
## [1] 3639

Get same data frame for Acantharian frequency:

br = seq( -1020, 0, by = 10)
ranges = br[-1]
freq   = hist(st10acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

A10<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A10)
## 'data.frame':    102 obs. of  2 variables:
##  $ depth          : num  -1010 -1000 -990 -980 -970 -960 -950 -940 -930 -920 ...
##  $ Acanthfrequency: int  0 0 1 0 0 0 0 0 0 4 ...
sum(A10$Acanthfrequency)
## [1] 359

Divide the acantharian frequency by photo frequency and convert to cells/L. And, plot cells/L by depth:

frequencyDF10 <- merge(A10, P10, by = "depth")[-1,]
frequencyDF10$conc <- frequencyDF10$Acanthfrequency / frequencyDF10$photofrequency
frequencyDF10$perLiter <- frequencyDF10$conc / 0.38962
frequencyDF10$Station <- "st.10"
frequencyDF10$pos_depth <- frequencyDF10$depth * -1

Station 15

Load and format CTD data:

ctd15 <- read.csv("~/Desktop/MiraiDPI/DPI_CTD_data/6KCDT0005/CTD/6KCDT0005_CTD_data_1sec_int.csv")
ctd15 <- ctd15[,1:4]
names(ctd15) <- c("Time", "Temp", "Salinity", "Depth")
ctd15$Depth <- ctd15$Depth * -1
ctd15$station <- 10
ctd15$Time <- gsub(":", "", ctd15$Time)

ctd15$Time2 <- paste0('20170606', ctd15$Time)
st15aCTD <- ctd15[,c(4,6)]
names(st15aCTD)<-c("Depth","photos")

st15as<- read.table("~/Desktop/MiraiDPI/st15DPI/st15rois.txt", col.names = st10aNames)
st15as$photos <- strtrim(st15as$photos, 14)

st15acanth <- merge(st15aCTD,st15as, by = "photos" )

Calculate the sampling volume (in order to get cells/L): Load names of all images from station 3 downcast:

st15pics<- read.table("~/desktop/MiraiDPI/st15DPI/st15ALLphotos.txt", col.names = st10aNames)
st15pics$photos <- strtrim(st15pics$photos, 14)
st15pics$photos <- as.numeric(st15pics$photos)
## Warning: NAs introduced by coercion
st15pics <- merge(st15aCTD,st15pics, by = "photos" )
st15pics <- st15pics[,-3]
names(st15pics)<- c("photos", "Depth")
st15pics$photos <- as.character(st15pics$photos)

Get data frame of image frequency by depth in 10m intervals:

br = seq( -780, 0, by = 10)
ranges = br[-1]
freq   = hist(st15pics$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

P15<- data.frame(depth = ranges, photofrequency = freq$counts)
str(P15)
## 'data.frame':    78 obs. of  2 variables:
##  $ depth         : num  -770 -760 -750 -740 -730 -720 -710 -700 -690 -680 ...
##  $ photofrequency: int  38 50 49 45 42 39 42 41 39 37 ...

How many photos total?

sum(P15$photofrequency)
## [1] 3056

Get same data frame for Acantharian frequency:

br = seq( -780, 0, by = 10)
ranges = br[-1]
freq   = hist(st15acanth$Depth, breaks=br, include.lowest=TRUE, plot=FALSE)

A15<- data.frame(depth = ranges, Acanthfrequency = freq$counts)
str(A15)
## 'data.frame':    78 obs. of  2 variables:
##  $ depth          : num  -770 -760 -750 -740 -730 -720 -710 -700 -690 -680 ...
##  $ Acanthfrequency: int  1 1 0 3 2 4 2 1 5 3 ...
sum(A15$Acanthfrequency)
## [1] 524

Divide the acantharian frequency by photo frequency and convert to cells/L. And, plot cells/L by depth:

frequencyDF15 <- merge(A15, P15, by = "depth")[-1,]
frequencyDF15$conc <- frequencyDF15$Acanthfrequency / frequencyDF15$photofrequency
frequencyDF15$perLiter <- frequencyDF15$conc / 0.3542
frequencyDF15$pos_depth <- frequencyDF15$depth * -1
frequencyDF15$Station <- "st.15"

frequencyDF_all<- rbind(frequencyDF10, frequencyDF15, frequencyDF17, frequencyDF3)

frequencyDF_all$Station <- factor(frequencyDF_all$Station, levels = c("st.10", "st.3", "st.15", "st.17"))

Check assumptions for linear regression:

frequencyDF_all$logdepth = log(frequencyDF_all$pos_depth+1)
frequencyDF_all$logconc =  log(frequencyDF_all$perLiter+1)

depth_conc_LM <- lm(logdepth ~ logconc, data = frequencyDF_all)

par(mfrow = c(2, 2))
plot(depth_conc_LM)

Residuals are normally distributed, but not homoscedastic:

# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(depth_conc_LM)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 81.69586, Df = 1, p = < 2.22e-16

Use Loess and correlation coefficient:

frequencyDF_all<- rbind(frequencyDF10, frequencyDF15, frequencyDF17, frequencyDF3)

frequencyDF_all$Station <- factor(frequencyDF_all$Station, levels = c("st.10", "st.3", "st.15", "st.17"))

conc_depth_all <- ggplot(frequencyDF_all, aes(x=pos_depth, y = perLiter)) + geom_point() + theme_bw() +theme(text = element_text(size=14)) + xlab("") +ylab("Imaged acantharian cells (per Liter)") +ggtitle("") + geom_smooth(color = "#0E899F")+  theme(axis.text.y = element_text(angle = 90, hjust = 1))+ theme(axis.text.x = element_text(angle = 90, hjust = 1)) + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())

conc_depth_all

maximum concentration and depth at each station:

frequencyDF_all %>% dplyr::select(perLiter, depth, Station) %>% group_by(Station) %>% top_n(n=1, wt = perLiter) %>% arrange(Station)
## # A tibble: 4 x 3
## # Groups:   Station [4]
##   perLiter depth Station
##      <dbl> <dbl> <fct>  
## 1    1.81    -50 st.10  
## 2    0.921   -30 st.3   
## 3    4.71      0 st.15  
## 4    2.82      0 st.17

Data are not normally distributed:

ggqqplot(frequencyDF_all$perLiter)
## Warning: Removed 2 rows containing non-finite values (stat_qq).
## Warning: Removed 2 rows containing non-finite values (stat_qq_line).

## Warning: Removed 2 rows containing non-finite values (stat_qq_line).

shapiro.test(frequencyDF_all$perLiter)
## 
##  Shapiro-Wilk normality test
## 
## data:  frequencyDF_all$perLiter
## W = 0.38755, p-value < 2.2e-16

Use Spearman Correlation Coefficient:

df_for_corr <- frequencyDF_all %>% dplyr::select(perLiter, pos_depth)
library(Hmisc)
rcorr(as.matrix(df_for_corr), type = "spearman")
##           perLiter pos_depth
## perLiter      1.00     -0.55
## pos_depth    -0.55      1.00
## 
## n
##           perLiter pos_depth
## perLiter       356       356
## pos_depth      356       358
## 
## P
##           perLiter pos_depth
## perLiter            0       
## pos_depth  0

Cell concentration is significantly negatively correlated with depth

3.2 Correlation between image conc and sequence % abundance

frequencyDF_surf <- frequencyDF_all %>% dplyr::filter(pos_depth < 50) %>% dplyr::filter(!is.na(perLiter)) %>% dplyr::group_by(Station)%>% dplyr::summarize(Conc= mean(perLiter)) 
frequencyDF_surf$Depth<- "Surface"
frequencyDF_SCM <- frequencyDF_all %>% dplyr::filter(pos_depth > 50 & pos_depth < 150) %>% dplyr::filter(!is.na(perLiter)) %>% dplyr::group_by(Station) %>% dplyr::summarize(Conc= mean(perLiter))
frequencyDF_SCM$Depth <-"SCM"
frequencyDF_Mid <- frequencyDF_all %>% dplyr::filter(pos_depth > 150 & pos_depth < 700) %>% dplyr::filter(!is.na(perLiter))%>% dplyr::group_by(Station) %>% dplyr::summarize(Conc= mean(perLiter))
frequencyDF_Mid$Depth <- "Mid"
frequencyDF_Bot<- frequencyDF_all%>% dplyr::filter(pos_depth > 700) %>% dplyr::filter(!is.na(perLiter)) %>% dplyr::group_by(Station) %>%  dplyr::summarize(Conc= mean(perLiter))
frequencyDF_Bot$Depth <- "Bottom"
frequencyBYlayer<-rbind(frequencyDF_surf, frequencyDF_SCM, frequencyDF_Mid, frequencyDF_Bot)

mergedps merged replicates so there is one sample per combination of variables

psD4 = tax_glom(mergedps, "D4")
psD4ra <- transform_sample_counts(psD4, function(OTU) 100* OTU/sum(OTU))
psD4ra_Aonly<- subset_taxa(psD4ra, D4 %in% c("Arthracanthida-Symphyacanthida", "Chaunacanthida")) 
psD4ra_Aonly_10<-subset_samples(psD4ra_Aonly, filter == "10")
psD4ra_Aonly_10_otus <-data.frame(otu_table(psD4ra_Aonly_10))
psD4ra_Aonly_10_otus$acanth <- rowSums(psD4ra_Aonly_10_otus)


metatable <- read.csv("sampleMap.csv")
row.names(metatable) <- metatable[[1]]
metatable$desc <- paste(metatable$Station, metatable$Depth, metatable$filter, sep="-")
metatable <- metatable[metatable$SampleID %ni% mixedup,]
metatable$Depth <- as.character(metatable$Depth)
metatable$Station <-as.character(metatable$Station)
metatable$filter<-as.character(metatable$filter)
metatable<- metatable[,-which(names(metatable) == "SampleID")]


metatable_percent_plot <- metatable %>% distinct(desc, .keep_all = TRUE) %>% filter(filter== "10")

row.names(metatable_percent_plot) <- metatable_percent_plot$desc
psD4ra_Aonly_10_otus_meta <- merge(psD4ra_Aonly_10_otus, metatable_percent_plot, by = "row.names")

DPIstations <- psD4ra_Aonly_10_otus_meta %>% dplyr::select(acanth, Station, Depth)

DPIstations$Station <- paste0('st.', DPIstations$Station)

final <- merge(DPIstations, frequencyBYlayer)

plot count by percent

final2 <- final %>% filter(Conc < 2 & acanth < 5) # remove two outliers

countBYperc2 <- ggplot(final2, aes(x=Conc, y = acanth)) + geom_point() + theme_bw() +theme(text = element_text(size=19)) + xlab("Imaged acantharian cells (per L)") +ylab("% Chaunacanthida-Arthracanthida-Symphyacanthida Sequences") +
  geom_smooth(method = lm, color = Cv[5]) +theme( axis.title.y = element_text(size = 17)) + theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank())
countBYperc2
## `geom_smooth()` using formula 'y ~ x'

ggsave("countbypercent_noO.png", width = 8, height = 8)
## `geom_smooth()` using formula 'y ~ x'
concByperc <- lm(acanth ~ Conc, data = final2)

par(mfrow = c(2, 2))
plot(concByperc )

Residuals are normally distributed

# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(concByperc )
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 0.9092953, Df = 1, p = 0.3403

Homoscedastic!

summary(concByperc)
## 
## Call:
## lm(formula = acanth ~ Conc, data = final2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8867 -0.5956 -0.3601  0.1804  3.3026 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   1.0217     0.3639   2.808   0.0158 *
## Conc          1.5803     0.6186   2.555   0.0252 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.114 on 12 degrees of freedom
## Multiple R-squared:  0.3523, Adjusted R-squared:  0.2983 
## F-statistic: 6.527 on 1 and 12 DF,  p-value: 0.02524

Data are not normally distributed:

shapiro.test(final2$acanth)
## 
##  Shapiro-Wilk normality test
## 
## data:  final2$acanth
## W = 0.85758, p-value = 0.0282
ggqqplot(final2$acanth)

shapiro.test(final2$Conc)
## 
##  Shapiro-Wilk normality test
## 
## data:  final2$Conc
## W = 0.69015, p-value = 0.0002927
ggqqplot(final2$Conc)

Spearman Correlation:

df_for_corr <- final2 %>% dplyr::select(acanth, Conc)
library(Hmisc)
rcorr(as.matrix(df_for_corr), type = "spearman")
##        acanth Conc
## acanth   1.00 0.75
## Conc     0.75 1.00
## 
## n= 14 
## 
## 
## P
##        acanth Conc 
## acanth        0.002
## Conc   0.002

3.3 Acantharian cell size distribution by depth

cropped all acantharian photos to touch the edges of the acantharian spines. Then got image size in pixels with: file ./* > imagesize.csv

entire image is 2448 x 2050 pixels 5.5 cm accross (55,000 µm) 55,000 / 2448 = 22.5

Station 17

Load and format data:

imagesize17 <- read.csv("~/Desktop/MiraiDPI/st17DPI/st17imagesize.csv", header = FALSE)
imagesize17$V1 <- substring(imagesize17$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize17$V2), " "))
imagesize17$height <- as.numeric(split[,2])
imagesize17$width <- as.numeric(split[,4])
imagesize17<- imagesize17[,c(1,5,6)]
names(imagesize17) <- c("photos", "height", "width")
imagesizedepth17<- merge(st17aCTD,imagesize17, by = "photos" )
imagesizedepth17$height <- (imagesizedepth17$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth17$width <- (imagesizedepth17$width * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth17$size <- imagesizedepth17$height * imagesizedepth17$width
imagesizedepth17$pos_depth <- imagesizedepth17$Depth * -1
imagesizedepth17$Station <- "st.17"
str(imagesizedepth17)
## 'data.frame':    237 obs. of  7 variables:
##  $ photos   : chr  "20170609085210" "20170609085215" "20170609085215" "20170609085220" ...
##  $ Depth    : num  -12.6 -13.2 -13.2 -13.1 -13.4 ...
##  $ height   : num  0.54 0.562 0.652 0.652 0.517 ...
##  $ width    : num  0.585 0.585 0.608 0.608 0.472 ...
##  $ size     : num  0.316 0.329 0.396 0.396 0.245 ...
##  $ pos_depth: num  12.6 13.2 13.2 13.1 13.4 ...
##  $ Station  : chr  "st.17" "st.17" "st.17" "st.17" ...

Station 3

Load and format data:

imagesize3 <- read.csv("~/Desktop/MiraiDPI/st3DPI/st3imagesize.csv", header = FALSE)
imagesize3$V1 <- substring(imagesize3$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize3$V2), " "))
imagesize3$height <- as.numeric(split[,2])
imagesize3$width <- as.numeric(split[,4])
imagesize3<- imagesize3[,c(1,5,6)]
names(imagesize3) <- c("photos", "height", "width")
imagesizedepth3<- merge(st3aCTD,imagesize3, by = "photos" )
imagesizedepth3$height <- (imagesizedepth3$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth3$width <- (imagesizedepth3$width * 22.5)/1000 #22.5um per pixel  /1000 to convert um to mm
imagesizedepth3$size <- imagesizedepth3$height * imagesizedepth3$width
imagesizedepth3$pos_depth <- imagesizedepth3$Depth * -1
imagesizedepth3$Station <- "st.3"
str(imagesizedepth3)
## 'data.frame':    115 obs. of  7 variables:
##  $ photos   : chr  "20170531074321" "20170531074322" "20170531074340" "20170531074342" ...
##  $ Depth    : num  -7.21 -7.22 -7.09 -7.24 -7.24 ...
##  $ height   : num  1.035 0.765 0.45 0.765 0.45 ...
##  $ width    : num  0.922 0.36 0.472 0.652 0.472 ...
##  $ size     : num  0.955 0.275 0.213 0.499 0.213 ...
##  $ pos_depth: num  7.21 7.22 7.09 7.24 7.24 ...
##  $ Station  : chr  "st.3" "st.3" "st.3" "st.3" ...

Station 10

Load and format data:

imagesize10 <- read.csv("~/Desktop/MiraiDPI/st10DPI/st10imagesize.csv", header = FALSE)
imagesize10$V1 <- substring(imagesize10$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize10$V2), " "))
imagesize10$height <- as.numeric(split[,2])
imagesize10$width <- as.numeric(split[,4])
imagesize10<- imagesize10[,c(1,5,6)]
names(imagesize10) <- c("photos", "height", "width")
imagesizedepth10<- merge(st10aCTD,imagesize10, by = "photos" )
imagesizedepth10$height <- (imagesizedepth10$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth10$width <- (imagesizedepth10$width * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth10$size <- imagesizedepth10$height * imagesizedepth10$width
imagesizedepth10$pos_depth <- imagesizedepth10$Depth * -1
imagesizedepth10$Station <- "st.10"
str(imagesizedepth10)
## 'data.frame':    359 obs. of  7 variables:
##  $ photos   : chr  "20170602061002" "20170602061004" "20170602061007" "20170602061008" ...
##  $ Depth    : num  -18.9 -19.4 -18.8 -18.5 -18.5 ...
##  $ height   : num  1.103 0.315 0.495 1.058 0.765 ...
##  $ width    : num  1.08 0.405 1.035 0.945 0.81 ...
##  $ size     : num  1.191 0.128 0.512 0.999 0.62 ...
##  $ pos_depth: num  18.9 19.4 18.8 18.5 18.5 ...
##  $ Station  : chr  "st.10" "st.10" "st.10" "st.10" ...

Station 15

Load and format data:

imagesize15 <- read.csv("~/Desktop/MiraiDPI/st15DPI/st15imagesize.csv", header = FALSE)
imagesize15$V1 <- substring(imagesize15$V1, 3, 16)
split<- do.call(rbind, strsplit(as.character(imagesize15$V2), " "))
imagesize15$height <- as.numeric(split[,2])
imagesize15$width <- as.numeric(split[,4])
imagesize15<- imagesize15[,c(1,5,6)]
names(imagesize15) <- c("photos", "height", "width")
imagesizedepth15<- merge(st15aCTD,imagesize15, by = "photos" )
imagesizedepth15$height <- (imagesizedepth15$height * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth15$width <- (imagesizedepth15$width * 22.5)/1000 #22.5um per pixel /1000 to convert um to mm
imagesizedepth15$size <- imagesizedepth15$height * imagesizedepth15$width
imagesizedepth15$pos_depth <- imagesizedepth15$Depth * -1
imagesizedepth15$Station <- "st.15"
str(imagesizedepth15)
## 'data.frame':    524 obs. of  7 variables:
##  $ photos   : chr  "20170606150930" "20170606150930" "20170606150931" "20170606150933" ...
##  $ Depth    : num  -12.2 -12.2 -11.8 -11.7 -11.7 ...
##  $ height   : num  1.282 0.9 0.652 1.597 1.89 ...
##  $ width    : num  1.215 0.585 0.63 0.81 1.823 ...
##  $ size     : num  1.558 0.526 0.411 1.294 3.445 ...
##  $ pos_depth: num  12.2 12.2 11.8 11.7 11.7 ...
##  $ Station  : chr  "st.15" "st.15" "st.15" "st.15" ...
imagesizedepth_all <- rbind(imagesizedepth10, imagesizedepth15,imagesizedepth17, imagesizedepth3)
imagesizedepth_all$Station <- factor(imagesizedepth_all$Station, levels = c("st.10", "st.3", "st.15", "st.17"))
depthsize_all<- ggplot(imagesizedepth_all, aes(x=pos_depth, y = size)) + geom_point() +  theme_bw() +theme(text = element_text(size=12)) + xlab("") +ylab(expression(Acantharian~ROI~area~(mm^2)))+ theme(axis.text.y = element_text(angle = 90, hjust = 1))+ theme(axis.text.x = element_text(angle = 90, hjust = 1)) 
depthsize_all

p <-imagesizedepth_all %>% dplyr::select(pos_depth, size) %>%
  
  # Add a new column called 'bin': cut the initial 'carat' in bins
  mutate( bin=cut_width(pos_depth, width=10, boundary=5) ) %>%
  
  # plot
  ggplot( aes(x=bin, y=size) ) +
    geom_boxplot(fill="#69b3a2", coef=100) +
    xlab("Depth") + theme_bw()+  theme(axis.text.x = element_text(angle = 90, hjust = 1))

p

but X axis is not continuous - bins with without any data are skipped

library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following objects are masked from 'package:Hmisc':
## 
##     is.discrete, summarize
## The following object is masked from 'package:ggpubr':
## 
##     mutate
## The following objects are masked from 'package:reshape':
## 
##     rename, round_any
## The following object is masked from 'package:matrixStats':
## 
##     count
## The following object is masked from 'package:IRanges':
## 
##     desc
## The following object is masked from 'package:S4Vectors':
## 
##     rename
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:purrr':
## 
##     compact
sizemediandf <- imagesizedepth_all %>% dplyr::select(pos_depth, size)

cutdataframe <- ddply(sizemediandf, .(cut(sizemediandf$pos_depth, 100)), colwise(mean))
cut_max <- ddply(sizemediandf, .(cut(sizemediandf$pos_depth, 100)), colwise(max))
cut_min <- ddply(sizemediandf, .(cut(sizemediandf$pos_depth, 100)), colwise(min))

names(cutdataframe) <- c("Depth", "meanSize")
names(cut_max)<- c("Depth", "maxSize")
names(cut_min) <- c("Depth", "minSize")

cut2<- merge(cutdataframe, cut_max, by =  "Depth")
cut3<- merge(cut2, cut_min, by =  "Depth")

cut3$depth1 <- substr(as.character(cut3$Depth), 2, 6)
cut3$depth1<- as.numeric(sapply(strsplit(cut3$depth1, ","), "[", 1))

Are linear regression assumptions met?

meanSizelm<- lm(meanSize ~ depth1, data = cut3)
par(mfrow = c(2, 2))
plot(meanSizelm)

Residuals are close to normally distributed (only 3 points far from straight line), looks homoscedastic

# Evaluate homoscedasticity
# non-constant error variance test
ncvTest(meanSizelm)
## Non-constant Variance Score Test 
## Variance formula: ~ fitted.values 
## Chisquare = 2.277725, Df = 1, p = 0.13124

it is homoscedastic

according to central limit theorem, probably could use a linear regression because sample size is large (82 > 30)

for consistency, will use loess and correlation coefficient:

depthsize_all<- ggplot(cut3, aes(x=depth1, y = meanSize)) + geom_point() +  theme_bw() +theme(text = element_text(size=14)) + xlab("") +ylab(expression(Acantharian~ROI~area~(mm^2))) +  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + geom_pointrange(aes(ymin= minSize , ymax=maxSize), alpha = 0.3)+ theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(axis.text.y = element_text(angle = 90, hjust = 1)) + geom_smooth(span =1 ,color = "#0E899F") + scale_y_continuous(expand = c(0, 0))

depthsize_all
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#ggsave("size_by_depth.png", width =14, height = 7)

Correlation Coefficient

is data normally distributed?

ggqqplot(cut3$meanSize)

shapiro.test(cut3$meanSize)
## 
##  Shapiro-Wilk normality test
## 
## data:  cut3$meanSize
## W = 0.48787, p-value = 2e-15

NO …

–> use Spearman Correlation Coefficient

df_for_corr <- cut3 %>% dplyr::select(meanSize, depth1)
rcorr(as.matrix(df_for_corr), type = "spearman")
##          meanSize depth1
## meanSize     1.00  -0.55
## depth1      -0.55   1.00
## 
## n= 82 
## 
## 
## P
##          meanSize depth1
## meanSize           0    
## depth1    0

4 Session Info

sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.14.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] plyr_1.8.5                  reshape2_1.4.3             
##  [3] CoDaSeq_0.99.2              car_3.0-5                  
##  [5] carData_3.0-3               zCompositions_1.3.3-1      
##  [7] truncnorm_1.0-8             NADA_1.6-1                 
##  [9] MASS_7.3-51.5               ALDEx2_1.14.1              
## [11] Hmisc_4.3-0                 Formula_1.2-3              
## [13] survival_3.1-8              ggpubr_0.2.4               
## [15] magrittr_1.5                gridExtra_2.3              
## [17] wesanderson_0.3.6.9000      reshape_0.8.8              
## [19] DESeq2_1.22.2               SummarizedExperiment_1.12.0
## [21] DelayedArray_0.8.0          BiocParallel_1.16.6        
## [23] matrixStats_0.55.0          Biobase_2.42.0             
## [25] GenomicRanges_1.34.0        GenomeInfoDb_1.18.2        
## [27] IRanges_2.16.0              S4Vectors_0.20.1           
## [29] BiocGenerics_0.28.0         RColorBrewer_1.1-2         
## [31] vegan_2.5-6                 lattice_0.20-38            
## [33] permute_0.9-5               phyloseq_1.26.1            
## [35] qiime2R_0.99.1              forcats_0.4.0              
## [37] stringr_1.4.0               dplyr_0.8.3                
## [39] purrr_0.3.3                 readr_1.3.1                
## [41] tidyr_1.0.0                 tibble_3.0.1               
## [43] ggplot2_3.3.0               tidyverse_1.3.0            
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1           backports_1.1.5        igraph_1.2.4.2        
##   [4] splines_3.5.0          digest_0.6.23          foreach_1.4.7         
##   [7] htmltools_0.4.0        fansi_0.4.0            checkmate_1.9.4       
##  [10] memoise_1.1.0          cluster_2.1.0          openxlsx_4.1.4        
##  [13] Biostrings_2.50.2      annotate_1.60.1        modelr_0.1.5          
##  [16] colorspace_1.4-1       blob_1.2.0             rvest_0.3.5           
##  [19] haven_2.2.0            xfun_0.11              crayon_1.3.4          
##  [22] RCurl_1.95-4.12        jsonlite_1.6           genefilter_1.64.0     
##  [25] iterators_1.0.12       ape_5.3                glue_1.3.1            
##  [28] gtable_0.3.0           zlibbioc_1.28.0        XVector_0.22.0        
##  [31] Rhdf5lib_1.4.3         abind_1.4-5            scales_1.1.0          
##  [34] DBI_1.0.0              Rcpp_1.0.3             xtable_1.8-4          
##  [37] htmlTable_1.13.3       foreign_0.8-72         bit_1.1-14            
##  [40] htmlwidgets_1.5.1      httr_1.4.1             acepack_1.4.1         
##  [43] ellipsis_0.3.0         farver_2.0.1           pkgconfig_2.0.3       
##  [46] XML_3.98-1.20          nnet_7.3-12            dbplyr_1.4.2          
##  [49] utf8_1.1.4             locfit_1.5-9.1         labeling_0.3          
##  [52] tidyselect_0.2.5       rlang_0.4.5            AnnotationDbi_1.44.0  
##  [55] munsell_0.5.0          cellranger_1.1.0       tools_3.5.0           
##  [58] cli_2.0.0              generics_0.0.2         RSQLite_2.1.4         
##  [61] ade4_1.7-13            broom_0.5.6            evaluate_0.14         
##  [64] biomformat_1.10.1      yaml_2.2.0             knitr_1.26            
##  [67] bit64_0.9-7            fs_1.3.1               zip_2.0.4             
##  [70] nlme_3.1-140           xml2_1.2.2             compiler_3.5.0        
##  [73] rstudioapi_0.10        curl_4.3               ggsignif_0.6.0        
##  [76] reprex_0.3.0           geneplotter_1.60.0     stringi_1.4.3         
##  [79] Matrix_1.2-18          multtest_2.38.0        vctrs_0.2.4           
##  [82] pillar_1.4.3           lifecycle_0.2.0        cowplot_1.0.0         
##  [85] data.table_1.12.8      bitops_1.0-6           R6_2.4.1              
##  [88] latticeExtra_0.6-28    rio_0.5.16             codetools_0.2-16      
##  [91] assertthat_0.2.1       rhdf5_2.26.2           withr_2.1.2           
##  [94] GenomeInfoDbData_1.2.0 mgcv_1.8-31            hms_0.5.2             
##  [97] grid_3.5.0             rpart_4.1-15           rmarkdown_1.18        
## [100] lubridate_1.7.4        base64enc_0.1-3